Knowledge Extraction from Semi-structured Data Based on Fuzzy Techniques

Authors

  • Paolo Ceravolo
  • Maria Cristina Nocerino
  • Marco Viviani
Abstract

In this work we propose a fuzzy technique to compare XML documents belonging to a semi-structured flow and sharing a common vocabulary of tags. Our approach is based on the idea of representing documents as fuzzy bags and, using a measure of comparison, evaluating structural similarities between them. Then we suggest how to organize the extracted knowledge in a class hierarchy, choosing a technique related to the domain of interest, later to be converted into a user ontology.
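The idea of comparing documents as fuzzy bags of tags can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not specify how membership degrees are assigned or which comparison measure is used, so the depth-based degree `1 / (depth + 1)` and the Jaccard-style ratio below are assumptions, and the names `fuzzy_bag` and `similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def fuzzy_bag(xml_text):
    """Map each tag to the descending list of membership degrees of its occurrences.

    Assumption (not from the paper): an occurrence at nesting depth d,
    with the root at depth 0, gets membership degree 1 / (d + 1).
    """
    bag = defaultdict(list)

    def visit(element, depth):
        bag[element.tag].append(1.0 / (depth + 1))
        for child in element:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return {tag: sorted(degrees, reverse=True) for tag, degrees in bag.items()}


def similarity(bag_a, bag_b):
    """Jaccard-style ratio (an assumed measure): intersection size over union size."""
    intersection = 0.0
    union = 0.0
    for tag in set(bag_a) | set(bag_b):
        a = bag_a.get(tag, [])
        b = bag_b.get(tag, [])
        # Pad the shorter list with zeros so occurrences pair up rank by rank.
        length = max(len(a), len(b))
        a = a + [0.0] * (length - len(a))
        b = b + [0.0] * (length - len(b))
        intersection += sum(min(x, y) for x, y in zip(a, b))
        union += sum(max(x, y) for x, y in zip(a, b))
    return intersection / union if union else 1.0


# Two toy documents sharing a common tag vocabulary.
doc_a = "<book><title/><author/><chapter><title/></chapter></book>"
doc_b = "<book><title/><chapter><title/><section/></chapter></book>"

score = similarity(fuzzy_bag(doc_a), fuzzy_bag(doc_b))
print(f"structural similarity: {score:.3f}")
```

Under these assumptions, identical structures score 1.0 and documents with no tags in common score 0.0; pairwise scores of this kind could then feed a clustering step that groups documents into candidate classes.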

Similar resources

A Fuzzy Approach for Pertinent Information Extraction from Web Resources

Recent work in machine learning for information extraction has focused on two distinct sub-problems: the conventional problem of filling template slots from natural language text, and the problem of wrapper induction, learning simple extraction procedures (“wrappers”) for highly structured text such as Web pages. For suitable regular domains, existing wrapper induction algorithms can efficientl...

OntoExtractor: A Fuzzy-Based Approach to Content and Structure-Based Metadata Extraction

This paper describes OntoExtractor, a tool for extracting metadata from heterogeneous sources of information, producing a “quick-and-dirty” hierarchy of knowledge. The tool is specifically tailored for quick classification of semi-structured data, which makes it convenient for dealing with web-based data sources.

Semi-Structured Data Extraction from Heterogeneous Sources

This paper concerns the extraction of semi-structured data from Web pages generated from multiple on-line services. This task is addressed by representing the schemas for semi-structured data and crafting generic wrappers based on the schemas. We introduce a hybrid representation method for schemas of semi-structured data, consisting of a concept hierarchy and a set of knowledge unit frames. A ...

Agricultural Knowledge Discovery from Semi-Structured Text

This research aims to develop an automatic knowledge discovery system for semi-structured Thai text to support plant diagnosis. Plant disease diagnosis is very important for farmers, who need to cure infected plants before infections become more severe. Prior to diagnosis, farmers need to gain knowledge retrieved primarily from text, including unstructured and semi-structured documents. As th...

Identifying Effective Knowledge Management Components and Indicators in Iranian Metropolitan Municipalities Using Fuzzy Delphi Technique

Objective: The purpose was to identify the components and effective characteristics of knowledge management in metropolitan municipalities in Iran. Methods: An applied qualitative fuzzy Delphi analysis method was used. The expert community comprised specialists in urban management and executives in the municipalities. Twenty experts were surveyed purposefully to generate a semi-structur...

Publication date: 2004